Data pruning using confidence measures for concatenative synthesis system built using automatically transcribed audio

Authors

Tejas Godambe

Sai Krishna Rallabandi

Suryakanth V Gangashetty

Abstract

Today, we can record and store large amounts of single-speaker audio data, and also download such data from the web. These data are generally prosodically rich and are therefore excellent candidates for building concatenative text-to-speech (TTS) systems. However, transcriptions for these audio data are often unavailable, and automatic transcriptions are error-prone. In addition, the audio contains acoustically bad (poorly articulated, noisy, inaudible, unintelligible, clipped) regions. Both problems can degrade the resulting synthesized voice, so pruning bad data becomes necessary. In this paper, we describe the development of two concatenative TTS systems using lecture speech downloaded from Coursera and an audiobook downloaded from Librivox. Confidence measures obtained from the ASR system, such as phone posterior probability and unit duration, are used to remove bad data. Voices built using automatic transcripts are compared with those built using reference transcripts, and the effect of data pruning on intelligibility and naturalness is investigated through perceptual evaluation on the Blizzard 2013 test corpus.
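
The pruning idea in the abstract can be illustrated with a minimal sketch. It assumes each candidate unit carries a phone posterior probability and a duration taken from the ASR alignment. The `Unit` class, the `prune_units` function, and all threshold values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """A candidate synthesis unit with ASR-derived confidence measures (hypothetical layout)."""
    phone: str
    posterior: float  # phone posterior probability, in [0, 1]
    duration: float   # unit duration in seconds


def prune_units(
    units: List[Unit],
    min_posterior: float = 0.5,   # assumed threshold, not from the paper
    min_duration: float = 0.03,   # assumed lower bound in seconds
    max_duration: float = 0.4,    # assumed upper bound in seconds
) -> List[Unit]:
    """Keep only units whose posterior and duration fall inside the given limits."""
    kept = []
    for unit in units:
        # A low posterior suggests a transcription error or a bad acoustic region.
        if unit.posterior < min_posterior:
            continue
        # Implausibly short or long units suggest a misalignment.
        if not (min_duration <= unit.duration <= max_duration):
            continue
        kept.append(unit)
    return kept


if __name__ == "__main__":
    candidates = [
        Unit("a", 0.90, 0.10),
        Unit("b", 0.20, 0.10),   # low posterior
        Unit("c", 0.80, 0.005),  # too short
        Unit("d", 0.95, 1.00),   # too long
    ]
    print([u.phone for u in prune_units(candidates)])
```

In practice the thresholds would need to be tuned per corpus, for example by checking how much data survives pruning and how the resulting voice fares in listening tests.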

Similar resources

Speech recognition based confidence measures for building voices from untranscribed speech

Today, large amounts of audio data are available on the web in the form of audiobooks, podcasts, video lectures, video blogs, and news bulletins. In addition, we can effortlessly record and store audio data such as read/lecture/impromptu speech on hand-held devices. These data are rich in prosody, provide a plethora of voices to choose from, and their availability can significantly reduce the overhea...

Prioritizing Audio Features Selection Using Analysis Hierarchy Process As A Mean To Extend User Control In Concatenative Sound Synthesis

User control is one of the most important heuristic principles of system design, as it gives users the freedom to choose a system's functions and serves as a means of communicating instructions to the system before it performs a specific task. Existing concatenative sound synthesis systems call for a more flexible user control function, in particular during feature selection. This paper studie...

An auditory-based distortion measure with application to concatenative speech synthesis

This study presents a new auditory-based distance measure with application to concatenative speech synthesis. This measure employs the Carney auditory model to produce a feature vector related to auditory perception. For concatenative synthesis, the new measure is employed to assess perceived discontinuities at segment transitions. Evaluations using a restricted data base environment show that ...

EarGram: an Application for Interactive Exploration of Large Databases of Audio Snippets for Creative Purposes

This paper outlines the creative and technical considerations behind earGram, an application built as a Pure Data patch for real-time concatenative sound synthesis. The system encompasses four generative strategies that automatically re-arrange and explore a database of descriptor-analyzed sound snippets (corpus) by rules other than its original temporal order into musically coherent outputs. O...

Spatializing Timbre With Corpus-Based Concatenative Synthesis

Corpus-based concatenative synthesis presents unique possibilities for the visualization of audio descriptor data. These visualization tools can be applied to sound diffusion in the physical space of the concert hall using current spatialization technologies. Using CATART and the FTM&CO library for MAX/MSP we develop a technique for the organization of a navigation space for synthesis based on ...

Publication date: 2015
